AITopics | Al Rayyan

Collaborating Authors

Al Rayyan

Examining the Rat in the Tunnel: Interpretable Multi-Label Classification of Tor-based Malware

Karunanayake, Ishan, AlSabah, Mashael, Ahmed, Nadeem, Jha, Sanjay

arXiv.org Artificial IntelligenceSep-25-2024

Despite being the most popular privacy-enhancing network, Tor is increasingly adopted by cybercriminals to obfuscate malicious traffic, hindering the identification of malware-related communications between compromised devices and Command and Control (C&C) servers. This malicious traffic can induce congestion and reduce Tor's performance, while encouraging network administrators to block Tor traffic. Recent research, however, demonstrates the potential for accurately classifying captured Tor traffic as malicious or benign. While existing efforts have addressed malware class identification, their performance remains limited, with micro-average precision and recall values around 70%. Accurately classifying specific malware classes is crucial for effective attack prevention and mitigation. Furthermore, understanding the unique patterns and attack vectors employed by different malware classes helps the development of robust and adaptable defence mechanisms. We utilise a multi-label classification technique based on Message-Passing Neural Networks, demonstrating its superiority over previous approaches such as Binary Relevance, Classifier Chains, and Label Powerset, by achieving micro-average precision (MAP) and recall (MAR) exceeding 90%. Compared to previous work, we significantly improve performance by 19.98%, 10.15%, and 59.21% in MAP, MAR, and Hamming Loss, respectively. Next, we employ Explainable Artificial Intelligence (XAI) techniques to interpret the decision-making process within these models. Finally, we assess the robustness of all techniques by crafting adversarial perturbations capable of manipulating classifier predictions and generating false positives and negatives.

classifier, dataset, prediction, (14 more...)

arXiv.org Artificial Intelligence

2409.16639

Country:

Oceania > Australia > Western Australia > Joondalup (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Add feedback

Computational analysis of the language of pain: a systematic review

Nunes, Diogo A. P., Ferreira-Gomes, Joana, Neto, Fani, de Matos, David Martins

arXiv.org Artificial IntelligenceMay-10-2024

Objectives: This study aims to systematically review the literature on the computational processing of the language of pain, or pain narratives, whether generated by patients or physicians, identifying current trends and challenges. Methods: Following the PRISMA guidelines, a comprehensive literature search was conducted to select relevant studies on the computational processing of the language of pain and answer pre-defined research questions. Data extraction and synthesis were performed to categorize selected studies according to their primary purpose and outcome, patient and pain population, textual data, computational methodology, and outcome targets. Results: Physician-generated language of pain, specifically from clinical notes, was the most used data. Tasks included patient diagnosis and triaging, identification of pain mentions, treatment response prediction, biomedical entity extraction, correlation of linguistic features with clinical states, and lexico-semantic analysis of pain narratives. Only one study included previous linguistic knowledge on pain utterances in their experimental setup. Most studies targeted their outcomes for physicians, either directly as clinical tools or as indirect knowledge. The least targeted stage of clinical pain care was self-management, in which patients are most involved. Affective and sociocultural dimensions were the least studied domains. Only one study measured how physician performance on clinical tasks improved with the inclusion of the proposed algorithm. Discussion: This review found that future research should focus on analyzing patient-generated language of pain, developing patient-centered resources for self-management and patient-empowerment, exploring affective and sociocultural aspects of pain, and measuring improvements in physician performance when aided by the proposed tools.

ehr, narrative, natural language processing, (14 more...)

arXiv.org Artificial Intelligence

2404.16226

Country:

North America > United States (0.14)
Europe > Portugal > Porto > Porto (0.05)
Europe > Portugal > Lisbon > Lisbon (0.05)
(4 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Industry:

Health & Medicine > Therapeutic Area > Rheumatology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Add feedback

Forcing Diffuse Distributions out of Language Models

Zhang, Yiming, Schwarzschild, Avi, Carlini, Nicholas, Kolter, Zico, Ippolito, Daphne

arXiv.org Artificial IntelligenceApr-16-2024

Despite being trained specifically to follow user instructions, today's language models perform poorly when instructed to produce random outputs. For example, when prompted to pick a number uniformly between one and ten Llama-2-13B-chat disproportionately favors the number five, and when tasked with picking a first name at random, Mistral-7B-Instruct chooses Avery 40 times more often than we would expect based on the U.S. population. When these language models are used for real-world tasks where diversity of outputs is crucial, such as language model assisted dataset construction, their inability to produce diffuse distributions over valid choices is a major hurdle. In this work, we propose a fine-tuning method that encourages language models to output distributions that are diffuse over valid outcomes. The methods we introduce generalize across a variety of tasks and distributions and make large language models practical for synthetic dataset generation with little human intervention.

biography, diversity, language model, (14 more...)

arXiv.org Artificial Intelligence

2404.10859

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
(22 more...)

Genre: Research Report (1.00)

Industry:

Government (0.68)
Energy (0.68)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Add feedback

Clinical translation of machine learning algorithms for seizure detection in scalp electroencephalography: a systematic review

Moutonnet, Nina, White, Steven, Campbell, Benjamin P, Mandic, Danilo, Scott, Gregory

arXiv.org Artificial IntelligenceApr-8-2024

Machine learning algorithms for seizure detection have shown great diagnostic potential, with recent reported accuracies reaching 100%. However, few published algorithms have fully addressed the requirements for successful clinical translation. For example, the properties of training data may critically limit the generalisability of algorithms, algorithms may be sensitive to variability across EEG acquisition hardware, and run-time processing costs may render them unfeasible for real-time clinical use cases. Here, we systematically review machine learning seizure detection algorithms with a focus on clinical translatability, assessed by criteria including generalisability, run-time costs, explainability, and clinically-relevant performance metrics. For non-specialists, we provide domain-specific knowledge necessary to contextualise model development and evaluation. Our critical evaluation of machine learning algorithms with respect to their potential real-world effectiveness can help accelerate clinical translation and identify gaps in the current seizure detection literature.

algorithm, detection, seizure detection, (16 more...)

arXiv.org Artificial Intelligence

2404.15332

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Lebanon > Beirut Governorate > Beirut (0.04)
North America > United States > New York (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.97)
Health & Medicine > Therapeutic Area > Genetic Disease (0.97)

Technology:

Information Technology > Data Science > Data Quality > Data Transformation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Methods for generating and evaluating synthetic longitudinal patient data: a systematic review

Perkonoja, Katariina, Auranen, Kari, Virta, Joni

arXiv.org Artificial IntelligenceSep-21-2023

The proliferation of data in recent years has led to the advancement and utilization of various statistical and deep learning techniques, thus expediting research and development activities. However, not all industries have benefited equally from the surge in data availability, partly due to legal restrictions on data usage and privacy regulations, such as in medicine. To address this issue, various statistical disclosure and privacy-preserving methods have been proposed, including the use of synthetic data generation. Synthetic data are generated based on some existing data, with the aim of replicating them as closely as possible and acting as a proxy for real sensitive data. This paper presents a systematic review of methods for generating and evaluating synthetic longitudinal patient data, a prevalent data type in medicine. The review adheres to the PRISMA guidelines and covers literature from five databases until the end of 2022. The paper describes 17 methods, ranging from traditional simulation techniques to modern deep learning methods. The collected information includes, but is not limited to, method type, source code availability, and approaches used to assess resemblance, utility, and privacy.

publication, resemblance, synthetic data, (14 more...)

arXiv.org Artificial Intelligence

2309.1238

Country:

Europe > Finland > Southwest Finland > Turku (0.04)
Asia > Middle East > Qatar > Al Rayyan (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

Abdul-Mageed, Muhammad, Zhang, Chiyu, Bouamor, Houda, Habash, Nizar

arXiv.org Artificial IntelligenceNov-9-2020

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This Shared Task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and are collected from the Twitter domain. As such, NADI is the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.

arabic natural language processing workshop, proceedings, subtask 1, (8 more...)

arXiv.org Artificial Intelligence

2010.11334

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Africa > Middle East > Djibouti (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(63 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback